Abstract. We define a new decidable logic for expressing and checking invariants
of programs that manipulate dynamically-allocated objects via pointers and
destructive pointer updates. The main feature of this logic is the ability to limit 
the neighborhood of a node that is reachable via a regular expression from a des- 
ignated node. The logic is closed under boolean operations (entailment, negation) 
and has a finite model property. The key technical result is the proof of decidabil- 
ity. 

We show how to express preconditions, postconditions, and loop invariants for
some interesting programs. It is also possible to express properties such as
disjointness of data-structures, and low-level heap mutations. Moreover, our logic
can express properties of arbitrary data-structures and of an arbitrary number
of pointer fields. The latter provides a way to naturally specify postconditions
that relate the fields on entry to a procedure to the fields on exit. Therefore, it is
possible to use the logic to automatically prove partial correctness of programs
performing low-level heap mutations.



1 Introduction 

The automatic verification of programs with dynamic memory allocation and pointer 
manipulation is a challenging problem. In fact, due to dynamic memory allocation and 
destructive updates of pointer-valued fields, the program memory can be of arbitrary 
size and structure. This requires the ability to reason about a potentially infinite number 
of memory (graph) structures, even for programming languages that have good capabil- 
ities for data abstraction. Usually abstract-datatype operations are implemented using 
loops, procedure calls, and sequences of low-level pointer manipulations; consequently, 
it is hard to prove that a data-structure invariant is reestablished once a sequence of op- 
erations is finished [19]. 

To tackle the verification problem of such complex programs, several approaches 
emerged in the last few years with different expressive powers and levels of automation, 
including works based on abstract interpretation [27, 34, 31], logic-based reasoning [23,
32], and automata-based techniques [24, 28, 5]. An important issue is the definition of a 
formalism that (1) allows us to express relevant properties (invariants) of various kinds 
of linked data-structures, and (2) has the closure and decidability features needed for 
automated verification. The aim of this paper is to study such a formalism based on
logics over arbitrary graph structures, and to find a balance between expressiveness, 
decidability and complexity. 

Reachability is a crucial notion for reasoning about linked data-structures. For
instance, to establish that a memory configuration contains no garbage elements, we must
show that every element is reachable from some program variable. Other examples of
properties that involve reachability are (1) the acyclicity of data-structure fragments,
i.e., every element reachable from node u cannot reach u, (2) the property that a
data-structure traversal terminates, e.g., there is a path from a node to a sink-node of the
data-structure, (3) the property that, for programs with procedure calls when references
are passed as arguments, elements that are not reachable from a formal parameter are
not modified.

A natural formalism to specify properties involving reachability is the first-order 
logic over graph structures with transitive closure. Unfortunately, even simple decidable 
fragments of first-order logic become undecidable when transitive closure is added [13, 
21]. 

In this paper, we propose a logic that can be seen as a fragment of the first-order 
logic with transitive closure. Our logic is (1) simple and natural to use, (2) expressive
enough to cover important properties of a wide class of arbitrary linked data-structures,
and (3) amenable to algorithmic modular verification using programmer-specified
loop invariants and procedure specifications.

Alternatively, our logic can be seen as a propositional logic with atomic propositions
modelling reachability between heap objects pointed-to by program variables and other
heap objects with certain properties. The properties are specified using patterns that
limit the neighborhood of an object. For example, in a doubly linked list, a pattern says
that if an object v has an emanating forward pointer that leads to an object w, then
w has a backward pointer into v.

The contributions of this paper can be summarized as follows: 

- We define the Logic of Reachable Patterns (LRP) where reachability constraints 
such as those mentioned above can be used. Patterns in such constraints are defined 
by quantifier-free first-order formulas over graph structures and sets of access paths 
are defined by regular expressions. 

- We show that LRP has a finite-model property, i.e., every satisfiable formula has a 
finite model. Therefore, invalid formulas are always falsified by a finite store. 

- We prove that the logic LRP is, unfortunately, undecidable. 

- We define a suitable restriction on the patterns leading to a fragment of LRP called
LRP2.

- We prove that the satisfiability (and validity) problem for LRP2 is decidable. This
is the main technical result of the paper, and the decidability proof is non-trivial.
The main idea is to show that every satisfiable LRP2 formula is also satisfied
by a tree-like graph. Thus, even though LRP2 expresses properties of arbitrary
data-structures, because the logic is limited enough, a formula that is satisfied on an
arbitrary graph is also satisfied on a tree-like graph. Therefore, it is possible to
answer satisfiability (and validity) queries for LRP2 using a decision procedure for
monadic second-order logic (MSO) on trees.



- We show that despite the restriction on patterns we introduce, the logic LRP2 is
still expressive enough for use in program verification: various important
data-structures, and loop invariants concerning their manipulation, are in fact definable
in LRP2.

The new logic LRP2 forms a basis of the verification framework for programs with
pointer manipulation [37], which has important advantages w.r.t. existing ones. For
instance, in contrast to decidable logics that restrict the graphs of interest (such as
monadic second-order logic on trees), our logic allows arbitrary graphs with an
arbitrary number of fields. We show that this is very useful even for verifying programs
that manipulate singly-linked lists in order to express postconditions and loop invariants
that relate the input and the output state. Moreover, our logic strictly generalizes the
decidable logic in [3], which inspired our work. Therefore, it can be shown that certain
heap abstractions including [16, 33] can be expressed using LRP2 formulas.

The rest of the paper is organized as follows: Section 2 defines the syntax and the
semantics of LRP, and shows that it has a finite model property, and that LRP is
undecidable; Section 3 defines the fragment LRP2, and demonstrates the expressiveness of
LRP2 on several examples; Section 4 describes the main ideas of the decidability proof
for LRP2; Section 5 discusses the limitations and the extensions of the new logics;
finally, Section 6 discusses the related work. The full version of the paper [36] contains
the formal definition of the semantics of LRP and proofs.

2 The LRP Logic 

In this section, we define the syntax and the semantics of our logic. For simplicity, 
we explain the material in terms of expressing properties of heaps. However, our logic 
can actually model properties of arbitrary directed graphs. Still, the logic is powerful 
enough to express the property that a graph denotes a heap. 

2.1 Syntax of LRP

LRP is a propositional logic over reachability constraints. That is, an LRP formula is a 
boolean combination of closed formulas in first-order logic with transitive closure that 
satisfy certain syntactic restrictions. 

Let $\tau = \langle C, U, F \rangle$ denote a vocabulary, where (i) $C$ is a finite set of constant
symbols usually denoting designated objects in the heap, pointed to by program variables;
(ii) $U$ is a set of unary relation symbols denoting properties, e.g., color of a node in a
Red-Black tree; (iii) $F$ is a finite set of binary relation symbols (edges) usually denoting
pointer fields.¹

A term $t$ is either a variable or a constant $c \in C$. An atomic formula is an equality
$t = t'$, a unary relation $u(t)$, or an edge formula $t \xrightarrow{f} t'$, where $f \in F$, and $t$, $t'$ are
terms. A quantifier-free formula $\psi(v_0, \ldots, v_n)$ over $\tau$ and variables $v_0, \ldots, v_n$ is an
arbitrary boolean combination of atomic formulas. Let $FV(\psi)$ denote the free variables
of the formula $\psi$.



¹ We can also allow auxiliary constants and fields including abstract fields [8].



Definition 1. Let $\psi$ be a conjunction of edge formulas of the form $v_i \xrightarrow{f} v_j$, where
$f \in F$ and $0 \le i, j \le n$. The Gaifman graph of $\psi$, denoted by $B_\psi$, is an undirected
graph with a vertex for each free variable of $\psi$. There is an arc between the vertices
corresponding to $v_i$ and $v_j$ in $B_\psi$ if and only if $(v_i \xrightarrow{f} v_j)$ appears in $\psi$, for some $f \in F$.
The distance between logical variables $v_i$ and $v_j$ in the formula $\psi$ is the minimal edge
distance between the corresponding vertices $v_i$ and $v_j$ in $B_\psi$.

For example, for the formula $\psi = (v_0 \xrightarrow{f} v_1) \wedge (v_0 \xrightarrow{f} v_2)$ the distance between $v_1$ and
$v_2$ in $\psi$ is 2, and its underlying graph $B_\psi$ looks like this: $v_1 - v_0 - v_2$.
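The distance of Definition 1 can be computed by a breadth-first search over the undirected Gaifman graph. The following is a minimal sketch, assuming a neighborhood formula is encoded as a list of (source, field, target) triples; the function name `gaifman_distance` and this encoding are illustrative and not part of the paper.

```python
from collections import deque

def gaifman_distance(edge_formulas, a, b):
    """Distance between variables a and b in the Gaifman graph of a
    conjunction of edge formulas, or None if they are not connected.

    edge_formulas: iterable of (source, field, target) triples (assumed encoding).
    """
    # Build the undirected graph: edge labels and directions are dropped.
    adjacency = {}
    for src, _field, dst in edge_formulas:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    if a == b:
        return 0
    if a not in adjacency or b not in adjacency:
        return None
    # Standard breadth-first search from a.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                if nxt == b:
                    return dist[nxt]
                queue.append(nxt)
    return None

# The example formula (v0 -f-> v1) and (v0 -f-> v2) from the text.
psi = [("v0", "f", "v1"), ("v0", "f", "v2")]
```

On the example formula, the call `gaifman_distance(psi, "v1", "v2")` follows the path $v_1 - v_0 - v_2$.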

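Definition 2. (Syntax of LRP) A neighborhood formula $N(v_0, \ldots, v_n)$ is a conjunction
of edge formulas of the form $v_i \xrightarrow{f} v_j$, where $f \in F$ and $0 \le i, j \le n$.
A routing expression is an extended regular expression, defined as follows:

\[
\begin{array}{rclll}
R & ::= & \emptyset & & \text{empty set} \\
  & |   & \epsilon & & \text{empty path} \\
  & |   & \xrightarrow{f} & f \in F & \text{forward along edge} \\
  & |   & \xleftarrow{f} & f \in F & \text{backward along edge} \\
  & |   & u & u \in U & \text{test if } u \text{ holds} \\
  & |   & \neg u & u \in U & \text{test if } u \text{ does not hold} \\
  & |   & c & c \in C & \text{test if } c \text{ holds} \\
  & |   & \neg c & c \in C & \text{test if } c \text{ does not hold} \\
  & |   & R_1 . R_2 & & \text{concatenation} \\
  & |   & R_1 | R_2 & & \text{union} \\
  & |   & R^* & & \text{Kleene star}
\end{array}
\]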

A routing expression can require that a path traverse some edges backwards. A routing 
expression has the ability to test presence and absence of certain unary relations and 
constants along the path. 

A reachability constraint is a closed formula of the form:

$\forall v_0, \ldots, v_n.\ R(c, v_0) \Rightarrow (N(v_0, \ldots, v_n) \Rightarrow \psi(v_0, \ldots, v_n))$

where $c \in C$ is a constant, $R$ is a routing expression, $N$ is a neighborhood formula,
and $\psi$ is an arbitrary quantifier-free formula, such that $FV(N) \subseteq \{v_0, \ldots, v_n\}$ and
$FV(\psi) \subseteq FV(N) \cup \{v_0\}$. In particular, if the neighborhood formula $N$ is true (the
empty conjunction), then $\psi$ is a formula with a single free variable $v_0$.
An LRP formula is a boolean combination of reachability constraints.

The subformula $N(v_0, \ldots, v_n) \Rightarrow \psi(v_0, \ldots, v_n)$ defines a pattern, denoted by $p(v_0)$.
Here, the designated variable $v_0$ denotes a "central" node of the "neighborhood"
reachable from $c$ by following an $R$-path. Intuitively, the neighborhood formula $N$ binds the
variables $v_0, \ldots, v_n$ to nodes that form a subgraph, and $\psi$ defines more constraints on
those nodes.



* In all our examples, a neighborhood formula $N$ used in a pattern is such that $B_N$ (the Gaifman
graph of $N$) is connected.



We use let expressions to specify the scope in which the pattern is declared: 

let $p_1(v_0) = N_1(v_0, v_1, \ldots, v_n) \Rightarrow \psi_1(v_0, \ldots, v_n)$ in $\varphi$

This allows us to write more concise formulas via sharing of patterns. 

Shorthands We use $c[R]p$ to denote a reachability constraint. Intuitively, the
reachability constraint requires that every node that is reachable from $c$ by following an $R$-path
satisfy the pattern $p$.

We use $c_1[R]\neg c_2$ to denote let $p(v_0) = (true \Rightarrow \neg(v_0 = c_2))$ in $c_1[R]p$. In this
simple case, the neighborhood is only the node assigned to $v_0$. Intuitively, $c_1[R]\neg c_2$
means that the node labelled by constant $c_2$ is not reachable along an $R$-path from
the node labelled by $c_1$. We use $c_1\langle R\rangle c_2$ as a shorthand for $\neg(c_1[R]\neg c_2)$. Intuitively,
$c_1\langle R\rangle c_2$ means that there exists an $R$-path from $c_1$ to $c_2$. We use $c_1 = c_2$ to denote
$c_1\langle\epsilon\rangle c_2$, and $c_1 \neq c_2$ to denote $\neg(c_1 = c_2)$. We use $c[R](p_1 \wedge p_2)$ to denote
$(c[R]p_1) \wedge (c[R]p_2)$, when $p_1$ and $p_2$ agree on the central node variable. When two patterns are
often used together, we introduce a name for their conjunction (instead of naming each
one separately): let $p(v_0) = (N_1 \Rightarrow \psi_1) \wedge (N_2 \Rightarrow \psi_2)$ in $\varphi$.

In routing expressions, we use $\Sigma$ to denote $(\xrightarrow{f_1} | \xrightarrow{f_2} | \ldots | \xrightarrow{f_m})$, the union of all the
fields in $F$. For example, $c_1[\Sigma^*]\neg c_2$ means that $c_2$ is not reachable from $c_1$ by any path.
Finally, we sometimes omit the concatenation operator "." in routing expressions.

Semantics An interpretation for an LRP formula over $\tau = \langle C, U, F \rangle$ is a labelled
directed graph $G = \langle V^G, E^G, C^G, U^G \rangle$ where: (i) $V^G$ is a set of nodes modelling the
heap objects, (ii) $E^G : F \to \mathcal{P}(V^G \times V^G)$ are labelled edges, (iii) $C^G : C \to V^G$
provides interpretation of constants as unique labels on the nodes of the graph, and
(iv) $U^G : U \to \mathcal{P}(V^G)$ maps unary relation symbols to the set of nodes in which they
hold.

We say that node $v \in G$ is labelled with $\sigma$ if $\sigma \in C$ and $v = C^G(\sigma)$, or $\sigma \in U$ and
$v \in U^G(\sigma)$. In the rest of the paper, graph denotes a directed labelled graph, in which
nodes are labelled by constant and unary relation symbols, and edges are labelled by
binary relation symbols, as defined above.

We define a satisfaction relation $\models$ between a graph $G$ and an LRP formula $\varphi$ ($G \models \varphi$)
similarly to the usual semantics of first-order logic with transitive closure over graphs
(see [36]).
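To make the intended reading of routing expressions and of the shorthands $c[R]p$ and $c_1\langle R\rangle c_2$ concrete, the following is a minimal sketch of an evaluator over a finite interpretation. It is not the formal semantics of [36]: the tuple encoding of expressions, the dictionary encoding of graphs, and the names `evaluate`, `box`, `diamond`, and `det` are all assumptions made for illustration, and a pattern is modelled simply as a Python predicate on the central node.

```python
def compose(r, s):
    """Relational composition: (a, c) such that (a, b) in r and (b, c) in s."""
    by_source = {}
    for b, c in s:
        by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in by_source.get(b, ())}

def evaluate(expr, graph):
    """Set of node pairs (a, b) such that b is reachable from a by an expr-path."""
    nodes, edges = graph["nodes"], graph["edges"]
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {(v, v) for v in nodes}
    if kind == "fwd":                       # ("fwd", f): forward along an f-edge
        return set(edges.get(expr[1], ()))
    if kind == "bwd":                       # ("bwd", f): backward along an f-edge
        return {(b, a) for a, b in edges.get(expr[1], ())}
    if kind == "test":                      # ("test", symbol, positive)
        _, symbol, positive = expr
        if symbol in graph["consts"]:
            holds = {graph["consts"][symbol]}
        else:
            holds = set(graph["unary"].get(symbol, ()))
        return {(v, v) for v in nodes if (v in holds) == positive}
    if kind == "cat":
        return compose(evaluate(expr[1], graph), evaluate(expr[2], graph))
    if kind == "alt":
        return evaluate(expr[1], graph) | evaluate(expr[2], graph)
    if kind == "star":                      # reflexive-transitive closure
        step = evaluate(expr[1], graph)
        closure = {(v, v) for v in nodes}
        while True:
            bigger = closure | compose(closure, step)
            if bigger == closure:
                return closure
            closure = bigger
    raise ValueError(f"unknown routing expression: {kind}")

def box(c, expr, pattern, graph):
    """c[R]p: every node reachable from c by an R-path satisfies the pattern."""
    start = graph["consts"][c]
    return all(pattern(b, graph) for a, b in evaluate(expr, graph) if a == start)

def diamond(c1, expr, c2, graph):
    """c1<R>c2: there is an R-path from c1 to c2."""
    return (graph["consts"][c1], graph["consts"][c2]) in evaluate(expr, graph)

def det(field):
    """Predicate version of a determinism pattern: at most one outgoing edge."""
    def pattern(v0, graph):
        return len({b for a, b in graph["edges"].get(field, ()) if a == v0}) <= 1
    return pattern

# A three-element list x -> a -> b -> c -> nil, with nil pointing to itself.
LIST = {
    "nodes": {"a", "b", "c", "nil"},
    "edges": {"n": {("a", "b"), ("b", "c"), ("c", "nil"), ("nil", "nil")}},
    "consts": {"x": "a", "null": "nil"},
    "unary": {},
}
N_STAR = ("star", ("fwd", "n"))
```

Evaluating each expression as a binary relation keeps the sketch short; it is quadratic in the number of nodes and is meant only for small example graphs.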

2.2 Properties of LRP

LRP with arbitrary patterns has a finite model property. If a formula $\varphi \in$ LRP has an
infinite model, each reachability constraint in $\varphi$ that is satisfied by this model has a
finite witness.

Theorem 1. (Finite Model Property) Every satisfiable LRP formula is satisfiable by 

a finite graph. 

Sketch of Proof: We show that LRP can be translated into a fragment of an infinitary
logic that has a finite model property. Observe that $c[R]p$ is equivalent to an infinite
conjunction of universal first-order sentences. Therefore, if $G$ is a model of $c[R]p$ then
every substructure of $G$ is also its model. Dually, $\neg c[R]p$ is equivalent to an infinite
disjunction of existential first-order sentences. Therefore, if $G$ is a model of $\neg c[R]p$,
then $G$ has a finite substructure $G'$ such that every substructure of $G$ that contains $G'$ is
a model of $\neg c[R]p$. It follows that every satisfiable boolean combination of formulas of
the form $c[R]p$ has a finite model. Thus, LRP has a finite model property.

The logic LRP is undecidable. The proof uses a reduction from the halting problem 
of a Turing machine. 

Theorem 2. (Undecidability) The satisfiability problem of LRP formulas is undecid- 
able. 

Sketch of Proof: Given a Turing machine $M$, we construct a formula $\varphi_M$ such that $\varphi_M$
is satisfiable if and only if the execution of $M$ eventually halts.

The idea is that each node in the graph that satisfies $\varphi_M$ describes a cell of a tape
in some configuration, with unary relation symbols encoding the symbol in each cell,
the location of the head and the current state. The $n$-edges describe the sequence of
cells in a configuration and a sequence of configurations. The $b$-edges describe how the
cell is changed from one configuration to the next. The constant $c_1$ marks the node that
describes the first cell of the tape in the first configuration, the constant $c_2$ marks the
node that describes the first cell in the second configuration, and the constant $c_3$ marks
the node that describes the last cell in the last configuration (see sketch in Fig. 1).



The most interesting part of the formula $\varphi_M$ ensures that all graphs that satisfy
$\varphi_M$ have a grid-like form. It states that for every node $v$ that is $n$-reachable from $c_1$,
if there is a $b$-edge from $v$ to $u$, then there is a $b$-edge from the $n$-successor of $v$ to the
$n$-successor of $u$:

let $p(v) = (v \xrightarrow{b} u) \wedge (v \xrightarrow{n} v_1) \wedge (u \xrightarrow{n} u_1) \Rightarrow (v_1 \xrightarrow{b} u_1)$ in $c_1[(\xrightarrow{n})^*]p$    (1)



Remark. The reduction uses only two binary relation symbols and a fixed number of 
unary relation symbols. It can be modified to show that the logic with three binary 
relation symbols (and no unary relations) is undecidable. 

3 The LRP2 Fragment and its Usefulness 

In this section we define the LRP2 fragment of LRP, by syntactically restricting the 
patterns. The main idea is to limit the distance between the nodes in the pattern in 
certain situations. 




Fig. 1. sketch of a model. 



Definition 3. A formula is in LRP2 if in every reachability constraint $c[R]p$, with a
pattern $p(v_0) = N(v_0, \ldots, v_n) \Rightarrow \psi(v_0, \ldots, v_n)$, $\psi$ has one of the following forms:

- (equality pattern) $\psi$ is an equality between variables $v_i = v_j$, where $0 \le i, j \le n$,
and the distance between $v_i$ and $v_j$ in $N$ is at most 2 (distance is defined in
Def. 1),

- (edge pattern) $\psi$ is of the form $v_i \xrightarrow{f} v_j$ where $f \in F$ and $0 \le i, j \le n$, and the
distance between $v_i$ and $v_j$ in $N$ is at most 1,

- (negative pattern) atomic formulas appear only negatively in $\psi$.

Remark. Note that formula (1), which is used in the proof of undecidability in Theo- 
rem 2, is not in LRP2, because p is an edge pattern with distance 3 between vi and ui, 
while LRP2 allows edge patterns with distance at most 1. 
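The distance restriction of Definition 3 is purely syntactic, so it can be checked mechanically. The sketch below assumes a neighborhood formula is a list of (source, field, target) triples and a conclusion is one of the tuples `("eq", vi, vj)`, `("edge", vi, f, vj)`, or `("neg",)`; these encodings and the function names are hypothetical. Treating unconnected variables as violating the restriction is also an assumption of this sketch.

```python
from collections import deque

def distance(neighborhood, a, b):
    """Distance between variables a and b in the Gaifman graph of the
    neighborhood formula (Def. 1), or None if they are not connected."""
    adjacency = {}
    for src, _field, dst in neighborhood:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    if a == b:
        return 0
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                if nxt == b:
                    return dist[nxt]
                queue.append(nxt)
    return None

def is_lrp2_pattern(neighborhood, conclusion):
    """Check the restriction of Def. 3 on a pattern N => psi."""
    kind = conclusion[0]
    if kind == "neg":                       # negative pattern: no distance bound
        return True
    if kind == "eq":                        # equality pattern: distance at most 2
        d = distance(neighborhood, conclusion[1], conclusion[2])
        return d is not None and d <= 2
    if kind == "edge":                      # edge pattern: distance at most 1
        d = distance(neighborhood, conclusion[1], conclusion[3])
        return d is not None and d <= 1
    raise ValueError(f"unknown conclusion kind: {kind}")

# Pattern of formula (1): the grid pattern from the undecidability proof.
GRID_N = [("v", "b", "u"), ("v", "n", "v1"), ("u", "n", "u1")]
GRID_PSI = ("edge", "v1", "b", "u1")

# Determinism: two f-successors of v0 must be equal.
DET_N = [("v0", "f", "v1"), ("v0", "f", "v2")]
DET_PSI = ("eq", "v1", "v2")
```

The grid pattern is rejected because $v_1$ and $u_1$ are at distance 3 in its neighborhood formula, matching the remark above, while the determinism pattern is accepted as an equality pattern at distance 2.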

3.1 Describing Linked Data-Structures 

In this section, we show that LRP2 can express properties of data-structures. Table 1
lists some useful patterns and their meanings. For example, the first pattern $det_f$ means
that there is at most one outgoing $f$-edge from a node. Another important pattern $uns_f$
means that a node has at most one incoming $f$-edge. We use the subscript $f$ to
emphasize that this definition is parametric in $f$.



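Pattern Name | Pattern Definition | Meaning
-------------|--------------------|--------
$det_f(v_0)$ | $(v_0 \xrightarrow{f} v_1) \wedge (v_0 \xrightarrow{f} v_2) \Rightarrow (v_1 = v_2)$ | $f$-edge from $v_0$ is deterministic
$uns_f(v_0)$ | $(v_1 \xrightarrow{f} v_0) \wedge (v_2 \xrightarrow{f} v_0) \Rightarrow (v_1 = v_2)$ | $v_0$ is not heap-shared by $f$-edges
$uns_{f,g}(v_0)$ | $(v_1 \xrightarrow{f} v_0) \wedge (v_2 \xrightarrow{g} v_0) \Rightarrow false$ | $v_0$ is not heap-shared by $f$-edge and $g$-edge
$inv_{f,b}(v_0)$ | $(v_0 \xrightarrow{f} v_1 \Rightarrow v_1 \xrightarrow{b} v_0) \wedge (v_0 \xrightarrow{b} v_1 \Rightarrow v_1 \xrightarrow{f} v_0)$ | edges $f$ and $b$ form a doubly-linked list between $v_0$ and $v_1$
$same_{f,g}(v_0)$ | $(v_0 \xrightarrow{f} v_1 \Rightarrow v_0 \xrightarrow{g} v_1) \wedge (v_0 \xrightarrow{g} v_1 \Rightarrow v_0 \xrightarrow{f} v_1)$ | edges $f$ and $g$ emanating from $v_0$ are parallel

Table 1. Useful pattern definitions ($f, b, g \in F$ are edge labels).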



Well-formed heaps We assume that $C$ (the set of constant symbols) contains a constant
for each pointer variable in the program (denoted by x, y in our examples). Also, $C$
contains a designated constant null that represents NULL values. Throughout the rest
of the paper we assume that all the graphs denote well-formed heaps, i.e., the fields of
all objects reachable from constants are deterministic, and dereferencing NULL yields
null. In LRP2 this is expressed by the formula:

$(\bigwedge_{c \in C} \bigwedge_{f \in F} c[\Sigma^*]det_f) \wedge (\bigwedge_{f \in F} null\langle\xrightarrow{f}\rangle null)$    (2)
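As an illustration of formula (2), the following sketch checks well-formedness directly on a concrete graph: every node reachable from a constant has at most one successor per field, and the node labelled null has an edge to itself for every field. The dictionary encoding and the names `reachable` and `is_well_formed` are assumptions of this sketch, not the decision procedure of the paper.

```python
def reachable(edges, start):
    """Nodes reachable from start by following any field forwards (Sigma*)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for pairs in edges.values():
            for a, b in pairs:
                if a == node and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen

def is_well_formed(edges, constants, null="null"):
    """Direct check of formula (2) on a finite graph.

    edges: dict mapping each field name to a set of (source, target) pairs.
    constants: dict mapping each constant symbol to its node.
    """
    # First conjunct: fields are deterministic on everything reachable from a constant.
    for node in constants.values():
        for v in reachable(edges, node):
            for pairs in edges.values():
                if len({b for a, b in pairs if a == v}) > 1:
                    return False
    # Second conjunct: dereferencing null yields null, for every field.
    nil = constants[null]
    return all((nil, nil) in pairs for pairs in edges.values())

GOOD = {"n": {("a", "b"), ("b", "nil"), ("nil", "nil")}}
BAD = {"n": {("a", "b"), ("a", "nil"), ("b", "nil"), ("nil", "nil")}}
CONSTS = {"x": "a", "null": "nil"}
```

In `BAD`, node `a` has two outgoing `n`-edges, so the determinism conjunct fails.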



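Name | Formula | Meaning
-----|---------|--------
$reach_{x,f,y}$ | $x\langle(\xrightarrow{f})^*\rangle y$ | the heap object pointed-to by y is reachable from the heap object pointed-to by x.
$cyclic_{x,f}$ | $x\langle(\xrightarrow{f})^+\rangle x$ | cyclicity: the heap object pointed-to by x is located on a cycle.
$unshared_{x,f}$ | $x[(\xrightarrow{f})^*]uns_f$ | every heap object reachable from x by an $f$-path has at most one incoming $f$-edge.
$disjoint_{x,f,y,g}$ | $x[(\xrightarrow{f})^*(\xleftarrow{g})^*]\neg y$ | disjointness: there is no heap object that is reachable from x by an $f$-path and also reachable from y by a $g$-path.
$same_{x,f,g}$ | $x[(\xrightarrow{f} | \xrightarrow{g})^*]same_{f,g}$ | the $f$-path and the $g$-path from x are parallel, and traverse same objects.
$inverse_{x,f,b,y}$ | $reach_{x,f,y} \wedge x[(\xrightarrow{f}.\neg y)^*]inv_{f,b}$ | doubly-linked lists between two variables x and y with $f$ and $b$ as forward and backward edges.
$tree_{root,r,l}$ | $root[(\xrightarrow{l} | \xrightarrow{r})^*](uns_{l,r} \wedge uns_l \wedge uns_r) \wedge \neg(root\langle(\xrightarrow{l} | \xrightarrow{r})^+\rangle root)$ | tree rooted at root.

Table 2. Properties of data-structures expressed in LRP2.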



Using the patterns in Table 1, Table 2 defines some interesting properties of
data-structures using LRP2. The formula $reach_{x,f,y}$ means that the object pointed-to by
the program variable y is reachable from the object pointed-to by the program
variable x by following an access path of $f$ field pointers. We can also use it with null
in the place of y. For example, the formula $reach_{x,f,null}$ describes a (possibly empty)
linked list pointed-to by x. Note that it implies that the list is acyclic, because null is
always a "sink" node in a well-formed heap. We can also express that there are no
incoming $f$-edges into the list pointed to by x, by conjoining the previous formula with
$unshared_{x,f}$. Alternatively, we can specify that x is located on a cycle of $f$-edges:
$cyclic_{x,f}$. Disjointness can be expressed by the formula $disjoint_{x,f,y,g}$ that uses both
forward and backward traversal of edges in the routing expression. For example, we
can express that the linked list pointed to by x is disjoint from the linked list pointed to
by y, using the formula $disjoint_{x,f,y,f}$. Disjointness of data-structures is important for
parallelization (e.g., see [17]).

The last two examples in Table 2 specify data-structures with multiple fields. The
formula $inverse_{x,f,b,y}$ describes a doubly-linked list with variables x and y pointing to the
head and the tail of the list, respectively. First, it guarantees the existence of an $f$-path.
Next, it uses the pattern $inv_{f,b}$ to express that if there is an $f$-edge from one node to
another, then there is a $b$-edge in the opposite direction. This pattern is applied to all
nodes on the $f$-path that starts from x and that does not visit y, expressed using the test
"$\neg y$" in the routing expression. The formula $tree_{root,r,l}$ describes a binary tree. The
first part requires that the nodes reachable from the root (by following any path of $l$ and
$r$ fields) be not heap-shared. The second part prevents edges from pointing back to the
root of the tree by forbidding the root to participate in a cycle.

3.2 Expressing Verification Conditions 

The reverse procedure shown in Fig. 2 performs in-place reversal of a singly-linked
list. This procedure is interesting because it destructively updates the list and requires
two fields to express partial correctness. Moreover, it manipulates linked lists in which
each list node can be pointed-to from the outside. In this section, we show that the
verification conditions for the procedure reverse can be expressed in LRP2. If the
verification conditions are valid, then the program is partially correct with respect to the
specification. The validity of the verification conditions can be checked automatically
because the logic LRP2 is decidable, as shown in the next section. In [37], we show how
to automatically generate verification conditions in LRP2 for arbitrary procedures that
are annotated with preconditions, postconditions, and loop invariants in LRP2.

Node reverse(Node x) {
L0:   Node y = NULL;
L1:   while (x != NULL) {
L2:     Node t = x->n;
L3:     x->n = y;
L4:     y = x;
L5:     x = t;
L6:   }
L7:   return y;
}



Fig. 2. Reverse. 

Notice that in this section we assume that all graphs denote valid stores, i.e.,
satisfy (2). The precondition requires that x point to an acyclic list on entry to the
procedure. We use the symbols $x^0$ and $n^0$ to record the values of the variable x and the
n-field on entry to the procedure.

$pre = x^0\langle(\xrightarrow{n^0})^*\rangle null^0$

The postcondition ensures that the result is an acyclic list pointed-to by y. Most
importantly, it ensures that each edge of the original list is reversed in the returned list, which
is expressed in a similar way to a doubly-linked list, using the inverse formula. We use the
relation symbols $y^7$ and $n^7$ to refer to the values on exit.

$post = y^7\langle(\xrightarrow{n^7})^*\rangle null^7 \wedge inverse_{x^0,n^0,n^7,y^7}$
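The informal content of the pre- and postcondition can be tested on concrete lists. The sketch below mirrors the loop of Fig. 2 in Python and then checks, on one input, that the result is an acyclic list and that every original edge between list nodes is reversed. It is only a runtime check on a single heap, not the symbolic verification described in this section; the dictionary encoding of the n-field, the helper names, and the decision to ignore edges into NULL are assumptions of the sketch.

```python
def reverse(head, nxt):
    """In-place list reversal mirroring Fig. 2.

    nxt maps each node to its n-successor, with None standing for NULL.
    """
    x, y = head, None
    while x is not None:
        t = nxt[x]      # L2: t = x->n
        nxt[x] = y      # L3: x->n = y
        y = x           # L4
        x = t           # L5
    return y

def path_to_null(start, nxt):
    """Nodes on the n-path from start, or None if the path is cyclic."""
    seen, nodes = set(), []
    while start is not None:
        if start in seen:
            return None
        seen.add(start)
        nodes.append(start)
        start = nxt[start]
    return nodes

def edges_reversed(before, after):
    """Every edge u -> v between list nodes in `before` appears as v -> u in `after`."""
    return all(after[v] == u for u, v in before.items() if v is not None)

nxt0 = {"a": "b", "b": "c", "c": None}   # n-field on entry (n^0)
nxt7 = dict(nxt0)                        # copy that will hold the n-field on exit (n^7)
y7 = reverse("a", nxt7)
```

Keeping `nxt0` and `nxt7` as separate maps plays the role of the two relation symbols $n^0$ and $n^7$: the postcondition relates the field on entry to the field on exit.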



The loop invariant $\varphi$ shown below relates the heap on entry to the procedure to the
heap at the beginning of each loop iteration (label L1). First, we require that the part
of the list reachable from x be the same as it was on entry to reverse. Second, the
list reachable from y is reversed from its initial state. Finally, the only original edge
outgoing of y is to x.

$\varphi = same_{x^1,n^0,n^1} \wedge inverse_{x^0,n^0,n^1,y^1} \wedge y^1\langle\xrightarrow{n^0}\rangle x^1$

Note that the postcondition uses two binary relations, $n^0$ and $n^7$, and also the loop
invariant uses two binary relations, $n^0$ and $n^1$. This illustrates that reasoning about
singly-linked lists requires more than one binary relation.

The verification condition of reverse consists of two parts, $VC_{loop}$ and $VC$,
explained below.

The formula $VC_{loop}$ expresses the fact that $\varphi$ is indeed a loop invariant. To express
it in our logic, we use several copies of the vocabulary, one for each program point.
Different copies of the relation symbol n in the graph model values of the field n at
different program points. Similarly, for constants. For example, Fig. 3 shows a graph
that satisfies the formula $VC_{loop}$ below. It models the heap at the end of some loop
iteration of reverse. The superscripts of the symbol names denote the corresponding
program points.



Fig. 3. An example graph that satisfies the $VC_{loop}$ formula for reverse.



To show that the loop invariant $\varphi$ is maintained after executing the loop body, we
assume that the loop condition and the loop invariant hold at the beginning of the
iteration, and show that the loop body was executed without performing a null-dereference,
and the loop invariant holds at the end of the loop body:

\[
\begin{array}{rll}
VC_{loop} = & (x^1 \neq null) & \text{loop is entered} \\
\wedge & \varphi & \text{loop invariant holds on loop head} \\
\wedge & (y^6 = x^1) \wedge x^1\langle\xrightarrow{n^1}\rangle x^6 \wedge x^1\langle\xrightarrow{n^6}\rangle y^1 & \text{loop body} \\
\wedge & same_{y^1,n^1,n^6} \wedge same_{x^6,n^1,n^6} & \text{rest of the heap remains unchanged} \\
\Rightarrow & (x^1 \neq null) & \text{no null-dereference in the body} \\
\wedge & \varphi^6 & \text{loop invariant after executing loop body}
\end{array}
\]

Here, $\varphi^6$ denotes the loop-invariant formula $\varphi$ after executing the loop body (label L6),
i.e., replacing all occurrences of $x^1$, $y^1$ and $n^1$ in $\varphi$ by $x^6$, $y^6$ and $n^6$, respectively. The
formula $VC_{loop}$ defines a relation between three states: on entry to the procedure, at the
beginning of a loop iteration and at the end of a loop iteration.



The formula $VC$ expresses the fact that if the precondition holds and the execution
reaches the procedure's exit (i.e., the loop is not entered because the loop condition does
not hold), the postcondition holds on exit: $VC = pre \wedge (x^1 = null) \Rightarrow post$.

4 Decidability of LRP2

In this section, we show that LRP2 is decidable for validity and satisfiability. Since LRP2 
is closed under negation, it is sufficient to show that it is decidable for satisfiability. 
The satisfiability problem for LRP2 is decidable. The proof proceeds as follows: 

1. Every formula $\varphi \in$ LRP2 can be translated into an equi-satisfiable normal-form
formula that is a disjunction of formulas in CLRP2 (Def. 4 and Theorem 3). It is
therefore sufficient to show that the satisfiability of CLRP2 is decidable.

2. Define a class of simple graphs $A_k$, for which the Gaifman graph is a tree with at
most $k$ additional edges (Def. 5).

3. Show that if a formula $\varphi \in$ CLRP2 has a model, then $\varphi$ has a model in $A_k$, where $k$ is
linear in the size of the formula $\varphi$ (Theorem 4). This is the main part of the proof.

4. Translate the formula $\varphi \in$ CLRP2 into an equivalent MSO formula.

5. Show that the satisfiability of MSO logic over $A_k$ is decidable, by reduction to
MSO on trees [30]. We could have also shown decidability using the fact that the
tree width of all graphs in $A_k$ is bounded by $k$, and that MSO over graphs with
bounded tree width is decidable [11, 1, 35].

Definition 4. (Normal-Form Formulas) A formula in CLRP2 is a conjunction of
reachability constraints of the form $c_1\langle R\rangle c_2$ and $c[R]p$, where $p$ is one of the patterns allowed
in LRP2 (Def. 3). A normal-form formula is a disjunction of CLRP2 formulas.

Theorem 3. There is a computable translation from LRP2 to a disjunction of formulas
in CLRP2 that preserves satisfiability.

Ayah Graphs We define a notion of a simple tree-like directed graph, called Ayah
graph.

Let $\mathcal{G}(S)$ denote the Gaifman graph of the graph $S$, i.e., an undirected graph
obtained from $S$ by removing node labels, edge labels, and edge directions (and parallel
edges). The distance between nodes $v_1$ and $v_2$ in $S$ is the number of edges on the
shortest path between $v_1$ and $v_2$ in $\mathcal{G}(S)$. An undirected graph $B$ is in $T^k$ if removing self
loops and at most $k$ additional edges from $B$ results in an acyclic graph.

Definition 5. For $k \ge 0$, an Ayah graph of $k$ is a graph $S$ for which the Gaifman graph
is in $T^k$: $A_k = \{S \mid \mathcal{G}(S) \in T^k\}$.
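Membership in $A_k$ is easy to test on a concrete graph: after dropping labels, directions, parallel edges and self loops, the least number of edges whose removal leaves an acyclic graph is the number of edges that close a cycle when the edges are added one by one. A minimal sketch using union-find follows; the function names and the encoding of a graph as a collection of (source, target) pairs are assumptions made for illustration.

```python
def min_extra_edges(directed_edges):
    """Smallest k such that the graph is in A_k.

    directed_edges: iterable of (source, target) pairs; labels are irrelevant here.
    """
    nodes = set()
    undirected = set()
    for a, b in directed_edges:
        nodes.update((a, b))
        if a != b:                          # self loops are removed for free
            undirected.add(frozenset((a, b)))   # merges parallel/anti-parallel edges

    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    extra = 0
    for edge in undirected:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            extra += 1                      # this edge closes a cycle
        else:
            parent[root_a] = root_b
    return extra

def in_A_k(directed_edges, k):
    return min_extra_edges(directed_edges) <= k

SINGLY_LINKED = [("a", "b"), ("b", "c"), ("c", "nil"), ("nil", "nil")]
DOUBLY_LINKED = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
TRIANGLE = [("a", "b"), ("b", "c"), ("c", "a")]
```

Note that a doubly-linked list is in $A_0$: its forward and backward edges collapse to a single undirected edge in the Gaifman graph.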

Let $\varphi \in$ CLRP2 be of the form $\varphi_\Diamond \wedge \varphi_\Box \wedge \varphi_= \wedge \varphi_\to$, where $\varphi_\Diamond$ is a conjunction
of constraints of the form $c_1\langle R\rangle c_2$, $\varphi_\Box$ is a conjunction of reachability constraints with
negative patterns, $\varphi_=$ is a conjunction of reachability constraints with equality patterns,
and $\varphi_\to$ is a conjunction of reachability constraints with edge patterns.



Theorem 4. If $\varphi \in$ CLRP2 is satisfiable, then $\varphi$ is satisfiable by a graph in $A_k$, where
$k = 2 \times n \times |C| \times m$, $m$ is the number of constraints in $\varphi_\Diamond$, $|C|$ is the number of
constants in the vocabulary, and for every regular expression that appears in $\varphi_\Diamond$ there
is an equivalent automaton with at most $n$ states.

Sketch of Proof: Let $S$ be a model of $\varphi$: $S \models \varphi$. We construct a graph $S'$ from $S$ and
show that $S' \models \varphi$ and $S' \in A_k$. The construction uses the following operations on
graphs.

Witness Splitting A witness $W$ for a formula $c_1\langle R\rangle c_2$ in CLRP2 in a graph $S$ is a
path in $S$, labelled with a word $w \in L(R)$, from the node labelled with $c_1$ to the node
labelled with $c_2$. Note that the nodes and edges on a witness path for $R$ need not be
distinct. Using $W$, we construct a graph $W'$ that consists of a path, labelled with $w$,
that starts at the node labelled by $c_1$ and ends at the node labelled by $c_2$. Intuitively, we
duplicate a node of $W$ each time the witness path for $R$ traverses it, unless the node is
marked with a constant. As a result, all shared nodes in $W'$ are labelled with constants.
Also, every cycle contains a node labelled with a constant. By construction, we get that
$W' \models c_1\langle R\rangle c_2$. We say that $W'$ is the result of splitting the witness $W$.

Finally, we say that $W$ is the shortest witness for $c_1\langle R\rangle c_2$ if any other witness path
for $c_1\langle R\rangle c_2$ is at least as long as $W$. The result of splitting the shortest witness is a
graph in $A_k$, where $k = 2 \times n \times |C|$: to break all cycles it is sufficient to remove all
the edges adjacent to nodes labelled with constants, and a node labelled with a constant
is visited at most $n$ times. (If a node is visited more than once in the same state of the
automaton, the path can be shortened.)

Merge Operation Merging two nodes in a graph is defined in the usual way by gluing
these nodes. Let $p(v_0) = N(v_0, v_1, v_2) \Rightarrow (v_1 = v_2)$ be an equality pattern. If a graph
violates a reachability constraint $c[R]p$, we can assign nodes $n_0$, $n_1$, and $n_2$ to $v_0$, $v_1$,
and $v_2$, respectively, such that there is an $R$-path from $c$ to $n_0$, $N(n_0, n_1, n_2)$ holds, and
$n_1$ and $n_2$ are distinct nodes. In this case, we say that the merge operation of $n_1$ and $n_2$
is enabled (by $c[R]p$). The nodes $n_1$ and $n_2$ can be merged to discharge this assignment
(other merge operations might still be enabled after merging $n_1$ and $n_2$).

Edge-Addition Operation Let $p(v_0) = N(v_0, v_1, v_2) \Rightarrow v_1 \xrightarrow{f} v_2$ be an edge pattern.
If a graph violates a reachability constraint $c[R]p$, we can assign nodes $n_0$, $n_1$, and $n_2$
to $v_0$, $v_1$, and $v_2$, respectively, such that there is an $R$-path from $c$ to $n_0$, $N(n_0, n_1, n_2)$
holds, and there is no $f$-edge from $n_1$ to $n_2$. In this case, we say that the edge-addition
operation is enabled (by $c[R]p$). We can add an $f$-edge from $n_1$ to $n_2$ to discharge
this assignment.

The following lemma is the key observation of this proof. 

Lemma 1. The class of $A_k$ graphs is closed under merge operations of nodes at
distance at most two and edge-addition operations at distance one.
Sketch of Proof: If an edge is added in parallel to an existing one (distance one), it does
not affect the Gaifman graph, thus $A_k$ is closed under edge-addition. The proof that $A_k$
is closed under merge operations is more subtle [36].



In particular, the class $A_k$ is closed under the merge and edge-addition operations forced
by LRP2 formulas. This is the only place in our proof where we use the distance restriction
of LRP2 patterns.
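
To illustrate the edge-addition half of Lemma 1, the following minimal Python sketch computes the Gaifman graph of a small edge-labelled graph. The representation of edges as (source, field, target) triples and the name `gaifman_edges` are assumptions made for illustration only. Adding an edge in parallel to an existing one leaves the Gaifman graph unchanged, whereas adding an edge between nodes at distance two changes it.

```python
def gaifman_edges(edges):
    """Undirected, unlabelled adjacency induced by labelled directed edges."""
    return {frozenset((u, v)) for u, _field, v in edges if u != v}

before = {("a", "l", "b"), ("b", "r", "c")}
parallel = before | {("b", "p", "a")}  # new p-edge parallel to the l-edge
distant = before | {("a", "p", "c")}   # new edge between nodes at distance two

print(gaifman_edges(parallel) == gaifman_edges(before))  # parallel edge: same graph
print(gaifman_edges(distant) == gaifman_edges(before))   # distant edge: new adjacency
```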

Given a graph $S$ that satisfies $\varphi$, we construct the graph $S'$ as follows:

1. For each constraint $i$ in $\varphi_\Diamond$, identify the shortest witness $W_i$ in $S$. Let $W'_i$ be the
result of splitting the witness $W_i$.

2. The graph $S_0$ is a union of all $W'_i$'s, in which the nodes labelled with the (syntactically)
same constants are merged.

3. Apply all enabled merge operations and all enabled edge-addition operations in
any order, producing a sequence of distinct graphs $S_0, S_1, \ldots, S_r$, until $S_r$ has no
enabled operations.

4. The result $S' = S_r$.
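
Steps 3 and 4 amount to a saturation loop. The Python sketch below illustrates the loop under strong simplifying assumptions that are ours, not the paper's: the routing-expression guard $c[R]$ is dropped, so patterns are checked at every node; only one equality pattern (determinism of a field, as in det) and one edge pattern (every forward edge has a back edge in parallel) are considered; and all function and field names are hypothetical.

```python
def merge(nodes, edges, keep, drop):
    """Glue node `drop` onto node `keep`, redirecting all incident edges."""
    def ren(n):
        return keep if n == drop else n
    return nodes - {drop}, {(ren(u), f, ren(v)) for u, f, v in edges}

def enabled_merge_det(edges, field):
    """Equality pattern: two distinct `field`-successors of the same node."""
    seen = {}
    for u, f, v in sorted(edges):
        if f != field:
            continue
        if u in seen and seen[u] != v:
            return seen[u], v
        seen.setdefault(u, v)
    return None

def enabled_edge_back(edges, fwd, back):
    """Edge pattern: every fwd-edge u -> v needs a back-edge v -> u."""
    for u, f, v in sorted(edges):
        if f == fwd and (v, back, u) not in edges:
            return (v, back, u)
    return None

def saturate(nodes, edges, fwd="n", back="b"):
    """Apply enabled operations until none is left and return the result."""
    nodes, edges = set(nodes), set(edges)
    while True:
        pair = enabled_merge_det(edges, fwd)
        if pair is not None:
            nodes, edges = merge(nodes, edges, *pair)  # fewer nodes
            continue
        new_edge = enabled_edge_back(edges, fwd, back)
        if new_edge is not None:
            edges.add(new_edge)                        # one more edge
            continue
        return nodes, edges

nodes, edges = saturate(
    {"x", "a1", "a2", "y"},
    {("x", "n", "a1"), ("x", "n", "a2"), ("a1", "n", "y")},
)
print(sorted(nodes), sorted(edges))
```

In this example the two n-successors of x lie at distance two and are merged, and each added b-edge runs in parallel to an existing n-edge, so both kinds of operation fall within the distance restrictions of Lemma 1.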

The process described above terminates after a finite number of steps, because in each 
step either the number of nodes in the graph is decreased (by merge operations) or the 
number of edges is increased (by edge-addition operations). 
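
One way to make this termination argument precise is a lexicographic measure. The notation below is our assumption and does not appear in the original: $V_i$ and $E_i$ denote the node and edge sets of $S_i$, and $F$ denotes the finite set of field symbols, so that $|E_i| \le |F| \cdot |V_i|^2$.

```latex
\mu(S_i) \;=\; \bigl(\, |V_i|,\;\; |F| \cdot |V_i|^2 - |E_i| \,\bigr)
```

A merge operation strictly decreases the first component, whatever happens to the number of edges. An edge-addition operation leaves the first component unchanged and strictly decreases the second. Hence $\mu$ decreases in the lexicographic order on pairs of natural numbers at every step, and the sequence $S_0, S_1, \ldots$ is finite.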

The proof proceeds by induction on the process described above. Initially, $S_0$ is in
$A_k$. By Lemma 1, all $S_i$ created in the third step of the construction above are in $A_k$;
in particular, $S' \in A_k$.

By construction of $S_0$, it contains a witness for each constraint in $\varphi_\Diamond$, and merge
and edge-addition operations preserve the witnesses, thus $S'$ satisfies $\varphi_\Diamond$. Moreover,
$S_0$ satisfies all constraints in $\varphi_\Box$. We show that merge and edge-addition operations
applied in the construction preserve $\varphi_\Box$ constraints, thus $S'$ satisfies $\varphi_\Box$. The process
above terminates when no merge and edge-addition operations are enabled, that is, $S'$
satisfies $\varphi_= \land \varphi_\rightarrow$. Thus, $S'$ satisfies $\varphi$.

The full proof is available in [36].

4.1 Complexity 

We proved decidability by reduction to MSO on trees, which allows us to decide LRP2
formulas using the MONA decision procedure [18]. Alternatively, a decision procedure for
LRP2 can directly construct a tree automaton from a normal-form formula, and can
then check emptiness of the automaton. The worst-case complexity of the satisfiability
problem of LRP2 formulas is at least doubly-exponential, but it remains elementary (in
contrast to MSO on trees, which is non-elementary); we are investigating tighter upper
and lower bounds. The complexity depends on the bound $k$ of $A_k$ models, according
to Theorem 4. If the routing expressions do not contain constant symbols, then the
bound $k$ does not depend on the routing expressions: it depends only on the number
of reachability constraints of the form $c_1\langle R\rangle c_2$. The LRP2 formulas that come up in
practice are well-structured, and we hope to achieve a reasonable performance.

5 Limitations and Further Extensions 

Despite the fact that LRP2 is useful, there are interesting program properties that it cannot
express. For example, transitivity of a binary relation, which can be used, e.g., to express
partial orders, is naturally expressible in LRP, but not in LRP2. Also, the property
that a general graph is a tree in which each node has a pointer back to the root is expressible
in LRP, but not in LRP2. Notice that the property is non-trivial because we are
operating on general graphs, and not just trees. Operating on general graphs allows us
to verify that the data-structure invariant is reestablished after a sequence of low-level
mutations that temporarily violate the invariant.

There are of course interesting properties that are beyond LRP, such as the property 
that a general graph is a tree in which every leaf has a pointer to the root of a tree. 

In the future, we plan to generalize LRP2 while maintaining decidability, perhaps
beyond LRP. We are encouraged by the fact that the proof of decidability in Section 4
holds "as is" for many useful extensions. For example, we can generalize the patterns
to allow neighborhood formulas with disjunctions and negations of unary relations. In
fact, more complex patterns can be used, as long as they do not violate the $A_k$ property.
For example, we can define trees rooted at $x$ with parent pointer $b$ from every tree
node to its parent by

$tree_{x,r,l,b} \land$ let $p(v_0) = ((v_1 \xrightarrow{l} v_0) \lor (v_1 \xrightarrow{r} v_0)) \Rightarrow (v_0 \xrightarrow{b} v_1)$ in $x[(\xrightarrow{l} \mid \xrightarrow{r})^*](det_b \land p)$.

The extended logic remains decidable, because the pattern $p$ adds edges only in parallel
to the existing ones.

Currently, reachability constraints describe paths that start from nodes labelled by
constants. We can show that the logic remains decidable when reachability constraints
are generalized to describe paths that start from any node that satisfies a quantifier-free
positive formula $\theta$:

$\forall v, w_0, \ldots, w_m, v_0, \ldots, v_n.\; R(v, v_0) \land \theta(v, w_0, \ldots, w_m) \Rightarrow (N(v_0, \ldots, v_n) \Rightarrow \psi(v_0, \ldots, v_n))$.

6 Related Work 

There are several works on logic-based frameworks for reasoning about graph/heap 
structures. We mention here the ones which are, as far as we know, the closest to ours. 

The logic LRP can be seen as a fragment of the first-order logic over graph structures
with transitive closure (TC logic [20]). It is well known that TC is undecidable, and that
this fact holds even when transitive closure is added to simple fragments of FO such as
the decidable fragment $L^2$ of formulas with two variables [29, 15, 13].

It can be seen that our logics LRP and LRP2 are both incomparable with $L^2$ +
TC. Indeed, in LRP no alternation between universal and existential quantification is
allowed. On the other hand, LRP2 allows us to express patterns (e.g., heap sharing) that
require more than two variables (see Table 1, Section 3).

In [3], a decidable logic $L_r$ (which can also be seen as a fragment of TC) is introduced.
The logics LRP and LRP2 generalize $L_r$, which is in fact the fragment of these
logics where only two fixed patterns are allowed: equality to a program variable and
heap sharing.

In [21, 2, 26, 4] other decidable logics are defined, but their expressive power is
rather limited w.r.t. LRP2 since they allow at most one binary relation symbol (modelling
linked data-structures with 1-selector). For instance, the logic of [21] does not
allow us to express the reversal of a list. Concerning the class of 1-selector linked
data-structures, [6] provides a decision procedure for a logic with reachability constraints
and arithmetical constraints on lengths of segments in the structure. It is not clear how
the proposed techniques can be generalized to larger classes of graphs. Other decidable
logics [7, 25] are restricted in the sharing patterns and the reachability they can describe.

Other works in the literature consider extensions of the first-order logic with fixpoint
operators. Such an extension is again undecidable in general but the introduction
of the notion of (loosely) guarded quantification allows one to obtain decidable fragments
such as $\mu$GF (or $\mu$LGF) (Guarded Fragment with least and greatest fixpoint operators)
[14, 12]. Similarly to our logics, the logic $\mu$GF (and also $\mu$LGF) has the tree
model property: every satisfiable formula has a model of bounded tree width. However,
guarded fixpoint logics are incomparable with LRP and LRP2. For instance, the LRP2
pattern $det_f$, which requires determinism of the $f$-field, is not a (loosely) guarded formula.

The PALE system [28] uses an extension of the monadic second order logic on
trees as a specification language. The considered linked data structures are those that
can be defined as graph types [24]. Basically, they are graphs that can be defined as
trees augmented by a set of edges defined using routing expressions (regular expressions)
defining paths in the (undirected structure of the) tree. LRP2 allows us to reason
naturally about arbitrary graphs without limitation to tree-like structures. Moreover, as
we show in Section 3, our logical framework allows us to express postconditions and
loop invariants that relate the input and the output state. For instance, even in the case
of singly-linked lists, our framework allows us to express properties that cannot be expressed
in the PALE framework: in the list reversal example of Section 3, we show that
the output list is precisely the reversed input list, whereas in the PALE approach, one
can only establish that the output is a list that is a permutation of the input list.

In [22], we tried to employ a decision procedure for MSO on trees to reason about
reachability. However, this places a heavy burden on the specifier to prove that the
data-structures in the program can be simulated using trees. The current paper eliminates
this burden by defining syntactic restrictions on the formulas and showing a general
reduction theorem.

Other approaches in the literature use undecidable formalisms such as [17], which 
provides a natural and expressive language, but does not allow for automatic property 
checking. 

Separation logic has been introduced recently as a formalism for reasoning about
heap structures [32]. The general logic is undecidable [10] but there are a few works
showing decidable fragments [10, 4]. One of the fragments is propositional separation
logic where quantification is forbidden [10, 9] and therefore seems to be incomparable
with our logic. The fragment defined in [4] allows one to reason only about singly-linked
lists with explicit sharing. In fact, the fragment considered in [4] can be translated
to LRP2, and therefore, entailment problems as stated in [4] can be solved as implication
problems in LRP2.

7 Conclusions 

Defining decidable fragments of first order logic with transitive closure that are useful
for program verification is a difficult task (e.g., [21]). In this paper, we demonstrated
that this is possible by combining three principles: (i) allowing arbitrary boolean combinations
of the reachability constraints, which are closed formulas without quantifier
alternations, (ii) defining reachability using regular expressions denoting pointer access
paths (not) reaching a certain pattern, and (iii) syntactically limiting the way patterns
are formed. Extensions of the patterns that allow larger distances between nodes in the
pattern either break our proof of decidability or are directly undecidable.

The decidability result presented in this paper improves the state-of-the-art significantly.
In contrast to [21, 2, 26, 4], LRP allows several binary relations. This provides a
natural way to (i) specify invariants for data-structures with multiple fields (e.g., trees,
doubly-linked lists), (ii) specify postconditions for procedures that mutate pointer fields
of data-structures, by expressing the relationships between fields before and after the
procedure (e.g., list reversal, which is beyond the scope of PALE), and (iii) express
verification conditions using a copy of the vocabulary for each program location.

We are encouraged by the expressiveness of this simple logic and plan to explore its 
usage for program verification and abstract interpretation. 